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Abstract: A bivariate distribution with continuous margins can be uniquely de- 
composed via a copula and its marginal distributions. We consider the problem of 
estimating the copula function and adopt a Bayesian approach. On the space of 
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copula functions, we construct a finite dimensional approximation subspace which 
is parametrized by a doubly stochastic matrix. A major problem here is the selec- 

p-^ tion of a prior distribution on the space of doubly stochastic matrices also known 

as the Birkhoff polytope. The main contributions of this paper are the derivation 
j of a simple formula for the Jeffreys prior and showing that it is proper. It is known 

in the literature that for a complex problem like the one treated here, the above 
t/) results are difficult to obtain. The Bayes estimator resulting from the Jeffreys 

prior is then evaluated numerically via Markov chain Monte Carlo methodology. A 
rather extensive simulation experiment is carried out. In many cases, the results 
favour the Bayes estimator over frequentist estimators such as the standard kernel 

J — estimator and Deheuvels' estimator in terms of mean integrated squared error. 
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1. Introduction 

^ Copulas have received considerable attention over the last years because of 

^ their increasing use in multiple fields such as environmental studies, genetics, 

data networks and simulation. They are also currently one of the hot topics in 



quantitative finance and insurance, see for instance Genest et al. (2009). Since 
it is precisely the copula that holds the dependence structure among various 
random quantities, estimating a copula is part of many techniques employed 
in these fields. For instance, in Risk Measurement, the Value-at-Risk (VaR) is 
computed by simulating asset (log)returns from an estimated copula. Further 
financial examples in which copulas are estimated are provided in the books 
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written by Cherubini et al. (2004 1, and Trivedi and Zimmer (20051. In this 



paper, we provide new generic methodology for estimating copulas, and that, in 
a Bayesian framework. 

Let us first recall that a bivariate copula C is a distribution function on 
S = [0, 1] X [0, 1] with uniform margins. In particular, copulas are Lipschitz 
continuous and form an equicontinuous family. They are bounded by the so- 
called Frechet-Hoeffding copulas, that is 

max(0, u + V — 1) < C{u, v) < min(n, v), for all (n, v) £ S. 

Sklar's Theorem states that a bivariate distribution F is completely characterized 
by its marginal distributions Fx,Fy and its copula C. More precisely, we have 
the representation 



= CiFxix),FY{y)), for all {x,y) G 



(1.1) 



where C is well defined on Ran(i^x) x Ran(Fy), see Nelsen (1999|. In particular. 



the copula is unique if Fx and Fy are continuous, and in this case, we have the 
following expression for the copula 



C{u,v) = F{F^\u),Fy\v)), for ah {u,v) G S. 



(1.2) 



Let {{xi,yi),i = 1, . . . , n} be a sample, where every {xi,yi) is a realization 
of the random couple (Xi,Yi), i = 1, . . . ,n, with joint cumulative distribution 
function F, and continuous marginal distributions Fx and Fy- We consider 
the problem of estimating the copula C by a copula C, where C depends on 
the sample. In this problem, the individual marginal distributions are treated 
as nuisance parameters. The literature presents three generic approaches for 
estimating C, namely the fully parametric, the semiparametric and the nonpara- 
metric approaches. Below, we briefly describe each approach and emphasize on 
two nonpar ametric estimators, since we will subsequently compare our estimator 
with these. 



The fully parametric approach. When parametric models for both the marginal 
distributions Fx and Fy and for the copula function C are specified, the like- 
lihood of the sample {{xi, yi): i = 1, . . . , n} is computed via equation ( 1.1 ). In 
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principle, estimates can be jointly obtained for the marginal distribution param- 
eters and for the copula. However, when joint estimation is computationally 



difficult, Joe (1997) proposes a two-step method in which the marginal distri- 



butions are estimated in a first stage and then plugged-in thereafter as the true 
margins, enabling the estimation of the copula function in a second step. This 
approach is called Inference for Margins (IFM). The asymptotic efficiency of 



IFM is discussed in Joe (20051 by considering maximum likelihood estimates at 



both stages of the procedure. In a Bayesian setup, Silva and Lopes (2008) argue 



that under a deviance-based model selection criteria, the joint estimation of the 
marginal parameters and of the copula parameters is better than the two-step 
procedure. 

The semiparametric approach. Here a parametric model is assumed for the cop- 



ula function C while the margins are kept unspecified. In this setup, Genest 



et al. (19951 have proposed to use n/(n -|- 1) times the empirical distributions as 
the estimates Fx and Fy and a pseudo-likelihood estimator for C. The authors 
show that the resulting estimator is consistent and asymptotically normal. In 



Kim et al. (2007), comparisons are made between the fully parametric approach 



and the semiparametric approach proposed by Genest et al. (1995). More re- 



cently, in a Bayesian setup, Hoff (20081 proposes a general estimation procedure. 



via a likelihood based on ranks, that does not depend on any parameters describ- 
ing the marginal distributions. The latter methodology can accommodate both 
continuous and discrete data. 



The nonparametric approach. This approach exploits equation (1.2). Here we 



describe Deheuvels' estimator and the kernel estimator. Let F be the empirical 
cumulative distribution function and let F^^ and Fy^ be the generalized inverses. 
We say that C satisfies the Deheuvels constraint provided that for all i,j = 
1, . . . ,n, 

C{t/n,j/n) = F{F^\i/n),Fy\j/n)), 

n 

= (1/n) l(rank(xfc) < i,rank(yyt) <j). (Deheuvels' constraint) 

k=l 
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In Deheuvels ( 1979 1 , the asymptotic behaviour of the class of copulas C satisfying 



the Deheuvels constraint is described. Note that C is sometimes called empirical 



copula in the literature, see Nelsen ( 1999 1 for instance. We propose an estimator 



that satisfies Deheuvels' constraint in Lemma |3] which we call Deheuvels' esti- 
mator henceforth. One nice property of this estimator is its invariance under 
strictly increasing transformations of the margins. In other words, if / and g are 
two strictly increasing functions, then Deheuvels' estimator based on the original 
sample and the one based on the sample {{f{xi),g{yi)) : i = 1, . . . ,n} are iden- 
tical. This is a desirable property for a copula estimator since it is inherent to 
copulas themselves. 

In general, if is a smooth kernel estimator of F {Fx and Fy are continuous 
say), then 

C{u,v) = F (^F^^ (u) , Fy^ (v)^ , for all {u,v) S S, (kernel estimator) 
is called a kernel estimator for C. Asymptotic properties of Gaussian kernel esti- 



mators are discussed in Fermanian and Scaillet (|2003|) , and the reader is referred 
to 



Charpentier et al. (2006) for a recent review. 



Although both of the nonparametric estimators discussed above have good 
asymptotic properties, it is not the case for finite samples in general. In fact, 
these estimators give poor results for many types of dependency structures which 
is illustrated in Section 5. This could be a considerable inconvenience for prac- 
titioners working with small samples. 

Our aim is to develop a Bayesian alternative for the estimation of C which 



circumvents this problem. Following Genest et al. (1995), the marginal distribu- 



tions are kept unspecified when these are unknown, and we use n/{n + l) times 
the empirical distributions as their estimates. In view of this, our methodol- 
ogy can be called empirical Bayes. When the marginal distributions are known, 
they are transformed into uniform distributions, and in this case our procedure is 
purely Bayesian. In both cases, our estimator has the property of being invariant 
under monotone transformations of the margins, just like Deheuvels' estimator. 

Essentially, our model is obtained as follows. First, in Section 2 we construct 
an approximation subspace £/ C where is the space of all copulas. This is 
achieved by considering a norm || • || and setting a precision e > so that for every 
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copula C G 'rf there exists a copula AGs/ such that ||C — A|| < e. Moreover, £/ 
is finite dimensional, it is parametrized by a doubly stochastic matrix P. Then, 
C is obtained by concentrating a prior on s^, and by computing the posterior 
mean, that is the Bayes estimator under squared error loss. Now two problems 
arise, the first one is the prior selection on £/ and the second one concerns the 
numerical evaluation of the Bayes estimator. These are the topics of Sections 3 
and 4 respectively. While the problem of evaluating the Bayes estimator is solved 
using a Metropolis-within-Gibbs algorithm, the choice of the prior distribution 
is a much more delicate problem. A copula from our model can be written as 
a finite mixture of distributions. The mixing weights form a matrix W which is 
proportional to a doubly stochastic matrix. Therefore specifying a prior on £/ 
boils down to specifying a prior for the mixing weights. We assume that we do 
not have any information that we could use for the construction of a subjective 
prior. Also, it is not our intention to obtain a Bayes estimator better than some 
given other estimator. For these reasons we shall rely on an objective prior, and 
a natural candidate is the Jeffreys prior. The main contributions of our paper are 
the derivation of a simple expression for the Jeffreys prior, and showing that it 
is proper. The fact that these results are generally difficult to come up with, for 
finite mixture problems, has been raised before in the literature, see for instance 



Titterington et al. (19851 and Bernardo and Giron (19881. Moreover, here we face 



the additional difficulty that the mixing weights are further constrained, since 
their sum is fixed along the rows and the columns of VF. To the best of our knowl- 
edge, nothing has yet been published for this problem. Finally, in Section 5, we 
report results of an extensive simulation in which we compare our estimator with 
Deheuvels' estimator and the Gaussian kernel estimator. Fortunately, in many 
cases, the results favour the Bayes estimator over these frequentist estimators in 
terms of mean integrated squared error. 



2. The model for the copula function 

For every m > 1, we construct a finite dimensional approximation subspace 
£ifjn C 'if. The construction of £/m uses a basis which forms a partition of unity. 
The elements of s/m are parametrized by a doubly stochastic matrix P. The 
choice of the basis is fixed while P varies. The representation is given in expres- 
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sion (2.3). Furthermore, we give upper bounds on 



sup inf \\A — C\\oo- 

A partition of unity is a set of nonnegative functions if = defined 
over the unit interval [0, 1], such that rmpi is a density for all i = 1, . . . , m, and 

m 

<Pi{u) = 1, for all u G [0, 1]. 

i=l 

Particular examples are given by indicator functions 

V'l = l[0,l/m]> V'i = l((i-l)/m,i/m]; i = 2, . . . , m, (2.1) 

and Bernstein polynomials 

ipi = B^_-i\ i = l,...,m, (2.2) 

where 

B^{u) = - ur~\ for aU u G [0, 1]. 

See Li et al. ( 1998[ ) for more examples of partitions of unity. In the following, let 
$ = ($1, . . . ,$m)', where ^i{u) = ipiit) dt, for all u e [0,1], i = 1, . . . ,m and 
let 

Ap{u,v) = m<P{uyP<^{v), for ah {u,v) G S, (2.3) 

where P is an m x m doubly stochastic matrix. The following Lemma is straight- 
forward to prove. 

Lemma 1. For every doubly stochastic matrix P, Ap is an absolutely continuous 
copula. 

In view of the above result, we define the approximation space as 

£/rn = {Ap : P is a doubly stochastic matrix}. 

The approximation order of £/m is now discussed, it depends on the choice of the 
basis Let = {{i/m, j /m) : i, j = 1, . . . , m}, be a uniformly spaced grid on 
the unit square S. For a given copula C, let Rc = {C{i/m,j/m))^j^-^ be the 
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restriction of C on • Let 
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then Pc = mDRcD' is a doubly stochastic matrix. Upper bounds for \\Ap^ 
C\\oo are given in the following Lemma. 

Lemma 2. Let C he a copula and let A = Ap^ G 



(a) For a model using indicator functions basis (2.1), we have Ra = Rc o,nd 
\\A - C||oo < 2/m. 



(h) For a model using the Bernstein basis (2.2), we have \\A — C\\oo < l/y/rn. 



Proof, (a) A direct evaluation shows that Ra = Rc- From the Lipschitz con- 
dition, if two copulas Ci and C2 satisfy the contraint fici = RC21 then \\Ci — 
C2II00 < 2/m. 



(5) First, it is well known that m^'D = {B'^, . . . , B^). For any {u,v) £ S 
consider X and Y two independent random variables, X has a binomial(m, u) 
distribution and Y has a binomial(m, v) distribution. Let 9 = {u,v). We have 



Therefore, 



A{e) = ¥.0[C{X/m,Y/m)\. 



snv\A{e) -C{e)\ = snv\^e[C{X/m,Y/m) -C{u,v)]\, 
ees e<=s 

]e[\C{X/m,Y/m) - C{u,v)\], 



< sup Eg 

ees 

< sup Eg[\X / m — u\ + \Y/m — v\], 
ees 

= (2/m) sup Em[|X — mtt|]. 

m6(0,1) 
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In Lemma [5]of the Appendix, we give the exact value of sup„g(o,i) — mu]]. 

However, a simple expression for an upper bound is given by Holder's inequality 

sup 2Eu[\X — mu\]/m < sup 2^Var„[X]/m, 
«e(o,i) «e{o,i) 

= l/\/rn. 

□ 

Bernstein copulas have appeared in the past literature and their properties 



have been extensively studied in Sancetta and Satchell (2004 1 and Sancetta and 



Satchell (20011. However, in view of Lemma |2] and of the simplicity of indicator 



functions, we subsequently use the indicator functions basis given in (2.1 1 for ^ 
in our model. Notice that in this situation, if {{Ui, Vi)}^^^ denotes a random 
sample of size n, then 



'^ipi{Uk)(pj{Vk) i,j = l,...,m, 



k=l 



is a sufficient statistic with a multinomial(?7-, m~"^P) distribution. So there is a 
connection between our problem and the problem of estimating the probabilities 
in a multinomial setup when the probabilities live in a constrained parameter 
space. 

The following Lemma is used to define what we call Deheuvels' estimator. 

Lemma 3. Let {(xj, yi): i = 1, . . . ,n} be a sample, and let R = {vij) he the nxn 
matrix given by 



(1/n) ^ l(rank(xA;) < z,rank(yfc) < j), for i,j 



1, . . . ,n. 



k=l 



If we use the indicator basis (2.1) with m = n for <I>, then the copula 

Cdeh = n^^' DRD'^, (Deheuvels' estimator) 
satisfies Deheuvels' constraint. 



3. The prior distribution 

The choice of a prior concentrated on the approximation space is delicate. 
The prior distribution is specified on the set of doubly stochastic matrices 
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of order m, m > 1. Here, we adopt an objective point of view and derive the 
Jeffreys prior. We also discuss two representations of doubly stochastic matrices 
that can be useful for the specification of other prior distributions on i^. 

The set is a convex polytope of dimension (m — 1)^. It is known in the 
literature as the Birkhoff polytope and has been the object of much research 
in the past years. For instance, computing the exact value of its volume is an 



outstanding problem in mathematics, it is known only for m < 10, see Beck and 



Pixton (20031. 



The Fisher information matrix is obtained as follows. For m > 1, let P £ 
and let W = (l/m)^. The copula (2.3) is a mixture of bivariate distribution 
functions 



Ap{u,v) = m^{u)'P^{v) 

m m 
i=l j=l 



where W = {l/m)P G W, and ^'^(ti) = ^i{t)dt, for ah u E [0,1], with 
V'i(-) = nnpi{-),i = 1, . . . ,m. The density op of Ap is thus 



ap{u,v) = ^^Wij'4!i{u)'iljj{v), 
i=i j=i 

m m 

i=i j=i 

m—1 m—1 

1=1 j=l 



The last equality expresses the fact that there are (m — 1)^ free parameters in the 
model. By considering the indicator functions basis ( |2.1[ ), for all «i,ji,«2)i2 = 
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1, 

E 



,m — 1 
^2 



-d log ap{u, v) 



1 

^0 



ap{u,v) 
mrm 

1/ Winmi 



dudv, 



Although the information matrix is of order (m — 1) 



X [m 
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, if n = «2, ji = j2, 

if n = «2, ji / j2, 

if n / «2, ji = j2, 

if n / «2, ji / j2- 

the following 



result shows how to reduce the computation of its determinant to that of a 



matrix of order (m — 1) x (m — 1). The important reduction provided by (3.1) 
is greatly appreciated when running an MCMC algorithm which computes the 
determinant at every iteration. Most importantly, this expression enables us to 
derive the main result of this paper, that is Theorem [T| The proofs of these two 
results are quite technical, so we have put them in the Appendix. 

Lemma 4. The Fisher information for W = {wij)ij=i^„,^rn €z W is given by 

det{{l/m)I -mV'V) 



I{W) 



m™ det Dq det Di 



(3.1) 



where 



and 



^ — (^ij)i=l,...,my'=l,...,m— 1) 

Do = diag(ri;ii, . . . 

l)(m-l)): 



Di = diag{Wmm, Wim, ■ ■ • , U)^rn-l)m, '"'ml, • • • , Wm{m-1))- 

Theorem 1. The Jeffreys prior vr oc I^^'^ is proper. 

Now, in order to specify different priors we can consider the two following 
representations . 



The Hilbert space representation. Let = {P ~ (l/m)ll' : P S i^} and 
f = Span(=^o)- Consider the inner product <Vi,V2>= tr(yLF2) on Thus, Y 
is an (m — 1)^ dimensional Hilbert space and an orthonormal basis is given by 

{ViVj}ij=i^,„^rn-l, with 

1 . N/ 

Vi = (l,...,l,-z,0,...,0) , z = l,...,m- 1. 
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Now, for all P G there exists a unique (m — 1) x (m — 1) matrix a such that 

P = m-^ 11' + GaG' , (3.2) 

where G is the m x (m — 1) matrix given by G = {vi, V2, ■ ■ ■ ,fm-i)- In this 
representation a = G'PG. Therefore, if we let = G',^G, then we have a 
bijection between ^ and . The set is a bounded convex subset of M^™^-*^)^ 
with positive Lebesgue measure. From this, priors on ^ can be induced by priors 
on , and later on, we shall refer to the uniform prior on the polytope ^ as the 
uniform distribution on The above representation is also particularly useful 
to construct a Gibbs sampler for distributions on the polytope. 



The Birkhoff-von Neumann representation. Another decomposition is obtained 
by making use of the Birkhoff-von Neumann Theorem. Doubly stochastic ma- 
trices can be decomposed via convex combinations of permutation matrices. In 
fact, ^ is the convex hull of the permutation matrices and these are precisely the 
extreme points (or vertices) of Furthermore, every m x m doubly stochastic 
matrix P is a convex combination of at most k = (m — 1)^ -|- 1 permutation 



matrices, see Mirsky (1963). In other words, if {(Ti}[!!;]^ is the set of permuta- 



tion matrices and if P G then there exists 1 < ii < • • • < < m! such 
that P = X]j=i ^ij'^iji some weight vector (Aj^, . . . , Aj^.) lying in the {k — 1)- 
simplex = {(Ai, . . . , A^) : < \j, for all j and X]j=i — ^ prior distri- 
bution over the polytope can be selected using a discrete distribution over the 
set {1 < ii < • • • < ife < m !} and a continuous distribution over the simplex A^,, 



such as a Dirichlet distribution. See also Melilli and Petris (19951 for work in 
this direction. 

4. The MCMC algorithm 

Let {(xi,yi),i = l,...,n} be a sample, where each (xi,yi) is a realization 
of the random couple (Xi,Yi), i = l,...,n, with dependence structure given 
by a copula C, and with continuous marginal distributions Fx and Fy. If the 
marginal distributions are known, then the transformed observations Xi = Fx{xi) 
and yi = Fyiyi)., i = 1, . . . ,n, are both samples from a uniform distribution on 



(0, 1). If the marginal distributions are unknown, then we follow Genest et al. 



(19951 and consider the pseudo-observations Xi = {n/{n + l))Fx[xi) and yj 
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(n/(n + l))Fy (yj), i = 1, . . . , n, where Fx and Fy are the empirical distributions. 
The algorithm below describes the transition kernel for the Markov chain used 
to numerically evaluate the Bayesian estimator C associated to the Jeffreys prior 
vr. The type of algorithm is called Metropolis-within-Gibbs, see for instance 



Gamerman and Lopes (20061. An individual estimate is approximated by the 



sampling mean of the chain. 

Let T > 1 be the length of the chain, and at each iteration t, 1 <t <T, let Pt 



be the current doubly stochastic matrix. From representation (3.2) in the previous 
section, 

Pt-{l/m)ll'= ^ ^afci^;^^;,'. 

k=l 1=1 

Repeat for j = 1, . . . , m — 1: 

1. Select direction ViVj and compute the interval Tij C M as follows: 
1.1 For every p,q = 1, . . . ,m, find the largest interval T\f^^ such that 



m—1 m—1 

i^Kf > -l/m - ^ ^ c^kivi!'^vl'\ for all 7., G rjf \ 

k=l 1=1 



1.2 Taker,, =n,,,r(f^ 



2. Draw from the uniform distribution on Tij, and set f3ij = aij + 7ij and 
Pki = OLku for every k ^ i,l ^ j . The proposed doubly stochastic matrix is given 
by 



m—1 m—1 



k=l 1=1 



3. Accept Pt+i = P^ with probability 



a{PuP, )-uiin^l, }, (4.1) 



where L{- \ x,y) is the likelihood derived from expression (2.3). 

Note that the above algorithm could also be used with any prior specified 
via the Hilbert space representation described in the previous section, including 
the uniform prior on the polytope SS. In particular, it could be adapted to 



BAYESIAN ESTIMATION OF COPULAS USING THE JEFFREYS PRIOR 



13 



draw random doubly stochastic matrices according to such priors by replacing 



the acceptance probability (4.1) with 



a(Pi,Pr^)=min|l, 



In order to further describe the Jeffreys prior, we use the algorithm to approx- 
imate the probability of the largest ball contained in This ball has radius 
l/(m — 1), where m > 1 is the size of the doubly stochastic matrix. Although 
this probability can be obtained exactly for the uniform distribution, we never- 
theless approximate it using our algorithm, meanwhile providing some validation 



of the MCMC algorithm. Figure 4.1 shows the results we get for m = 4. 

Notice that this probability is much smaller for the Jeffreys prior, because it 
distributes more mass towards the extremities of the polytope than the uniform 
prior does. This may also be observed by plotting the density estimates of the 
radius of the doubly stochastic matrix, that is the ^2-distance of the doubly 
stochastic matrix from the centre of the polytope These are shown in Figure 



4. Simulation experiments 

The goal of the experiment is to study the performance of our estimator 
on artificial data sets generated from various known bivariate distributions. We 
provide evidence that the Bayesian estimator gives good results in general, or in 
other words, that the Jeffreys prior is a reasonable choice. 

For every data set, the copula function is estimated. Three different depen- 
dence structures are considered, the first one is the Gaussian copula 

Cp{u,v) = <!>p{<!>~\u),<!>-Hv)), \p\<l, 

where $p is the standard bivariate Gaussian cdf with correlation coefficient p and 
^ is the univariate standard normal distribution. See Figure [43] ^ a). The second 
dependence structure that we consider is obtained by the following: let ([/, V) 
be a random vector with uniform margins with joint distribution Cp, let W be 
an independent uniformly distributed random variable and consider the random 
vector {Uc,Vc) = {U,V)1{W < 1/2) + (^7, 1 - V)1{W > 1/2). The distribution 
of {Uc, Vc) is given by 

Cp,c{u,v) = l/2{Cp{u,v)-Cp{u,l-v)+u), for ah {u,v) e S. 
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Iterations 

(a) Uniform prior 



5 



4 




Iterations 

(b) Jeffreys prior 

Figure 4.1: Convergence of 1000 parallel MCMC runs for the probability of the largest 
ball contained in the polytope ^ with m = 4. Shaded region represents the range 
of the entire set of approximations at each iteration. Above is the convergence for the 
probability in the case of the uniform distribution. The flat line, in this case, corresponds 
to the true probability p « 0.0027. Below is the same for the Jeffreys prior. 
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Sample 



(b) 



Density estimate 




0.01 0.012 0.014 0.016 



(c) 



Sample 



(d) 



density estimate 



Figure 4.2: Plots of samples and density estimates of the radius (.if2-distancc of the 
doubly stochastic matrix from the center of the polytope on the interval [0,^95], 
where 995 is the 95*'' quantile of its distribution. Above are results when sampling from 
the uniform prior and below from the Jeffreys prior. Here m = 4. 
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Here, the index c is to highlight the "cross like" dependence structure, see Figure 
|4.3[ b). A "diamond like" dependence structure is also considered, this is obtained 
by the transformation {Ud,Vd) = {Uc + 1/2 (mod 1),K) which is distributed 
according to the copula 






(a) Gaussian 








(b) Cross (c) Diamond 

Figure 4.3: Gaussian copula with p = 0.5, and corresponding cross and diamond depen- 
dence structures. 

An extensive simulation experiment is carried out in two parts. In the first 
part, we consider the case of known marginal distributions and use bivariate data 
sampled from the copula models above. In the second part of the experiment, we 
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simulate an unknown margins situation. The data sets are generated from the 
same copulas, but a Student t with 7 degrees of freedom and a chi-square with 4 
degrees of freedom are considered as the first and second margins respectively. 

In the experiment, 1 000 samples of both sizes n = 30 and n = 100 are gener- 
ated from each model. The Bayesian estimators from the Jeffreys prior and the 
uniform prior over the polytope are evaluated. For the uniform distribution, we 
take a uniform distribution on J^', a subset of M('"'l)^ and we use the bijection 
^ = m~^ll' + G.^'G' given in expression (3.2). The maximum likelihood esti- 



mator for P in copula (2.3) is evaluated numerically. In every case, the order of 
the doubly stochastic matrix in our model is given by m = 6. We compare the 
performance of our estimators with Deheuvels' estimator given in Theorem |3] and 
the Gaussian kernel estimator. The results we obtain in the first and second part 
of the simulation are given in Figures |4.4| and |4.5| respectively. These Figures 
report the values of the mean integrated squared errors 



MISE(C) = E 



1 ^ 

{C{u, v) — C{u, v))'^dudv 



10 Jo 

for the five estimators as a function of the parameter p, for < p < 1. 

As the results indicate, the Bayesian approach generally outperforms De- 
heuvels' estimator and the kernel estimator. Unfortunately, this is not the case 
when the true model is the Gaussian copula Cp, for large values of p. As p 
increases to 1, the true copula approaches the Frechet-Hoeffding upper bound, 
also called the comonotone copula, corresponding to (almost sure) perfect pos- 
itive linear dependence. Notice that for the Bayes estimators, the MLE and 
Deheuvels' estimator, the invariance property mentioned in the introduction is 
reflected in the results since for each model, their MISE curves in both the known 
and the unknown margins cases are very similar. This is less the case for the ker- 
nel estimator. Notice also the resemblance in shape of the MISE for Deheuvels' 
estimator and the kernel estimator in the unknown margins cases. Finally the 
performance of the MLE is worth mentioning, since in many cases it has the 
smallest MISE, especially for large values of p. This is explained because the 
MLE will go on the boundary of the parameter space easily, while the Bayes 
estimator will always stay away from the boundary with the type of priors that 
we have selected. However, if such an extreme case is to happen in a real life 
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Figure 4.4: Plots of MISE against p in the known margins case. J: Bayes estimator 
using the Jeffreys prior, U: Bayes estimator using the uniform prior, MLE: maximum 
likelihood from our model, D: Deheuvels' estimator, K: kernel estimator. Here m = 6. 
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Figure 4.5: Plots of MISE against p in the unknown margins case. J: Bayes estimator 
using the Jeffreys prior, U: Bayes estimator using the uniform prior, MLE: maximum 
likelihood from our model, D: Deheuvels' estimator, K: kernel estimator. Here m = 6. 
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problem, it is probable that the practitioner has some insight on the phenomenon 
beforehand, and may choose to work with a more appropriate (subjective) prior. 

5. Discussion 

Two points need to be further discussed. First, our methodology is purely 
Bayesian only when the marginal distributions are known. When these are un- 
known, our methodology is empirical Bayes. In fact, in this case we propose a 
two-step procedure by first estimating the margins via the empirical marginal 
distributions and then plugging them in as the true distributions thereafter. We 



have chosen to do this because it is common practice to do so, see Genest et al. 



(19951, because it is simple to implement, and because our estimator is conse- 
quently invariant under increasing transformations of the margins. One way to 
propose a purely Bayesian estimator by using our model for the copula, is to 
use finite mixtures for the margins. This way, if the densities used in the latter 
mixtures have disjoint supports, then the Jeffreys prior for the mixing weights 



has a simple form and is proper, see Bernardo and Giron (19881. Now by select- 



ing independent Jeffreys priors for the margins and for the copula, the resulting 
prior is proper as well. 

Finally, our models given by the approximation spaces s^m, m > 1 are 



called sieves by some authors, see Grenander (1981 1 . Using these sieves we 



can construct a nonparametric model for the copula which can, in some sense, 
respect the infinite-dimensional nature of the copula functions. In fact, if we take 
= Dmyi-^m, then £/ is dense in the space of copulas. Our Bayesian method- 
ology can be easily adapted here. This can be achieved by selecting an infinite 
support prior for the model index m, and by using our methodology inside each 
model. The Bayesian estimator becomes an infinite mixture of the estimators 
proposed in this paper (one for each model m), where the mixing weights are 
given by the posterior probabilities of the models. 

Acknowledgements 

We wish to thank Daniel Stubbs and the staff at Reseau Quebecois de Calcul 
Haute Performance (RQCHP) for their valuable help with high performance com- 
puting. This research was supported by the Natural Sciences and Engineering 
Research Council of Canada (NSERC). 



BAYESIAN ESTIMATION OF COPULAS USING THE JEFFREYS PRIOR 



21 



Appendix 

Lemma 5. Consider X, a binomial{n,p) random variable. We have 

1/5(1/2, (n + l)/2), if n is odd, 

sup E[|X— npl] = { / \ n/2 / X 

o<p<i I h-(ra+l)-2j M + (n + l)-2ji/^(i/2,n/2), if n is even. 

Proof. Let gnip) = E[|X — np|]. We have 

= (M)!(n-1-L„pj)! ''^"'^"<' - ^'""^"'^ for all „ > l,p . [0, 1], 
where [_x\ = sup{n: n < a;,n is an integer} for all x. Therefore, 
sup gn{p) = max sup gn{p), 

0<p<l 0<fc<n-l{p. 



max o„ . 

o<fc<n-i \n + lJ 



First of all, supo<p<i ffitp) = 51(1/2) = 1/2 = 1/5(1/2,1). Assume that n > 1. 
Let hn{k) = gn (^) /gni^) , for A; = 0, . . . , n - 2. We have 



fc+2 



fi + {k + i)-A 

hn(k) = — ^ r and hn(k) = - — for k = 0,. . . , n—2. 

(l + (n-fc-l)-i)"-' Kin-2-k) 



However, 



log ( 1 + T ) = log (^1 + -j - T < for all t > 1. 



dt °\ tJ °V tJ t 

This implies that /i„ decreases on {0, . . . , n — 2}. Therefore, 



9i 



3ni—^^ < ■■ ■ < ffnf ^""'"/y^ ) > • • • > ffnf^rr) n is odd, 

\n+l/ \ n + 1 / \n+lJ 

/ 1 \ / n/2 \ /n/2 + l\ f ^ \ 

\^T^) < ■■■ < 9n[^—) = gn[ n~ > ^S-n — — r if nis eveu. 

Vn + 1/ Vn+1/ V n + 1 / Vn+1/ 



The final expression is obtained using the following identity: 

'n + l\ i^fV 



„,^2"rg.i)r(^)/r(i)taa„>o. 



□ 
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Proof of Lemma|4| Here we show how to compute I{W) = det ( E 



-9^ log ap{u,v) 



efficiently. First, notice that A = [ E 
Dq^ + CD^^C, where 



-d log ap{u,v) 



can be written as A 



and 



Dq - diag(u;ii, . . . • • • ,'W(m-l)l) • • • > ■"^(m-l)(m-l))) 

Di = diag{Wmm, Wim, ■ ■ • , U!^rn-l)m, Wml, Wm{m-1)), 



^ Im— 1 Im— 1 01m— 1 ' ' ' Im—1 ^ 
Im— 1 01m— 1 Im— 1 ■ ■ ■ Im—1 
Im— 1 01m— 1 01m— 1 ' ' ' Im—1 



(m-l)2x(2m-l) 



\ lm-1 Olm-1 Olm-1 • • • Im-1 / 

Thus, det^ = det(i:)i + C"L'oC)/(det L>o det Di). If we let B = (wij)i,j=i,...,m-i, 
then since ~ ^/m, for all j = 1, . . . ,m, and YlY=i 



.j^iWij = 1/m for all 



z = 1, . . . , m. 



det(Z)i + C'DqC) = det 



1 -2(l/m-U'mm) I'B' I'B 
Di + C'DqC=\ B1 (1/m)/ B 

B'l B' (l/m)I 

By elementary row and column operations, we get 

/ 1 (l/m)l' (l/m)l' 

(l/m)l B 
\ (l/m)l B' (1/m)/ 

so that 

det{D, + C'D,C) = deti^^^'"^}^ ^^^^^^ ^ - (l/m2)l2(„_i)l',(^_,)^ , 

/ (l/m)(/-(l/m)ll') S-(l/m2)ll' \ 
5'-(l/m2)ll' (l/m)(/- (l/m)ll') J' 

= det ((l/m)(/ - (l/m)ll')) det {\{l/m){I - (l/m)ll'] 
- [B' - (l/m2)ll'][(l/m)(/ - (l/m)ll')]"^[5 - (l/m2)ll'] 
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Finally, det ((l/m)(I - (l/m)ll')) = (1/m)" and - (l/m)ll')]"^ = 

m(/ + 11'), thus 

det{Di + C'DqC) = (l/m)™det((l/m)/- mF'l/)), 
where V = (■Wij)i=i,...,m;j=i,...,m-i- 

Proof of Theorem [l| We prove that the Jeffreys prior is proper. Consider 
the following partition V, V = {Vi V2 ■ ■ ■ Vm-i), where each Vj is a vector, 
j = 1, ... ,m — 1. The matrix {l/m)I — mV'V is symmetric nonnegative semi- 
definite, so that by Hadamard's inequality, we have 

■m— 1 

det{{l/m)I -mV'V) < Yl{l/m - m\\Vjf), 

i=i 

m—l I 

= Jl 2m w^jWkj 

j=l \ l<i<k<m 

m—l 



Let = {uij G {1/2, 1} : i, j = 1, . . . , m, a+j = m/2 + 1, j = 1, . . . , m 
1 and ct+m = m./2}. For any W G y^, we have 



yJl{W) = yjdei{{l/m)l -mV'V)/ ^ 



n 



< v^2™-V 

< a/2™-!/ 



m 



E 



m—l 



l<j,„_i<fcm-i<m l<ji</ci<m j = l 

E ■■■ 



m—l 

l<im_i<fcm_i<m l<ii<fci<m ^ i=l 
m 



1/2 



1/2 





m 




n 








m 


^ 


n 







v'2^^^E n 



ctij-l 



We need to show that the integral of 111^=1 '^ij^ ^ finite for all a £ £/. The 
integration is made with respect to Wij, i \/ j < m, the free variables. For any 
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permutation matrices Pi and P2, the transformation W P1WP2 is a one to 
one transformation from W onto W, and the Jacobian, in absolute value, is equal 
to one. Therefore, it is sufficient to verify that the integral of 111^=1 ''^tj' ^ 
finite for all a G =!2^o, where ^0 = {a E am-im = Omm = !}• The idea is to 
decompose the multiple integral into m — 2 iterated integrals over the sections 
given by 

^ = {wij >0:iAj = k,i\/j<m, Wk+ = w+k = ^/m}, k = l,...,m-2, 
and 

l^m-l = {Wij > 0: i,j = m-l,m, Wm-1+ = W+rn-l = Wm+ = W+rn = l/m}. 

Here, the set Wi is fixed, the sets Wk are parameterized by {wij > 0: i A j < 
k, iV j = k}, = 2, . . . , m — 2, and '^m-i is parameterized by {wij > : i A j < 
m — 1, i\/ j = m — 1, m}. By Pubini's Theorem, for any nonnegative function /, 
we can write 

/ f{W)l\dWij=[ {■■■ [ {f{W)dWm-lm-l}---\ n 
•^^ i,j=l I J^ru-l J ,A,=l,iVj<m 

The next step consists in finding finite functions Cjt, k = l,...,m — l,on^, 
such that 

/ n n '^'^^i - 

iAj=k,i\/j<m iAj=k,iVj<m 

for all a G =2^^, uniformly on {wij > 0: i A j < k, j = k}, foi k = 1, . . . ,m — 2, 
and uniformly on {wij > 0: lAj < m—1, i\/ j = m — 1, m}, for k = m—1. This 
will give us that 

rn — l 

n n ^ n ^'^(")' 

i,j<m i,j<m k=l 

for all a G M)- 

Let a = V {J2e<m-i('^^ni - Wm-ie)} V {Ef<m-i(^m<? " Wim-i)} and 6 = 
1/m - {(X]£<m_i -w^m-i) V iJ2e<m-i Wm-ii)}- If a > 6, the set 'Wm-i is empty. 
Suppose that Wm-i is not empty and a G >e/o- Let 60 = 1/m — '}2e<m-i '^Im-i- 
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We have 



/. n 

■J ifm-\ ij=m-l,m 



dWm-1 



m—1 



< 



— 1 m — 1 1 



(6o - u) 



I,Q!m-l m-l+C*m m-l ^1 73/„ „ \ 

Oq ij{Oim-lm-l-,Oimm-l] 



^ 1 m— 1 ) C^m m— 1 ) ) 

= Cm-l{a). 



For A: = 1, . . . , m — 2 and a G we can take 

/ / m \ / m 

Ck{a) = ( I Ofcfc, ^ Oifc I + -B j a/cfc, ^ "fci 

i=k+i / \ i=fc+i / / r (Ej>fc a^fc) r (^Ej>fc 

The justification is given by Lemma |6] the fohowing Lemma. 
Lemma 6. // < a, 6 < 1, m > 3, q > 0, 

m—1 



/?j > 0, j = 1, . . . , m — 1, OTt/i 13 = Pj > I, 

m—1 

7j>0,i = l,...,m — 1, OTt/i 7 = 7i > 1, 



i=l 



and 



C = < 



Wii 



>0:iAj = l,i\/j<m, wij = a, = b> , 



i=l 



then 



P m m 

/"■n'li^fr'n-a-"' 



m—1 m—1 



dtOii dwij dWii < 

j=2 i=2 



(B (a, P) + B (a, 7)) ^.^^y^.^;^^ • 



Proof. Let 



aA6 



K{a,b,a,p,j) = I w^'-^a - wf-\b - w^-^dw. 
/o 
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If a < 6, then 

K{a,b,a,P,^) = [ w^'-^a-wf-^b-wy-^dw, 



< 6^-1 / w"-\a-wf-^dw, 
Jo 

= a''+^^^b'^-^B{a,/3) < B{a,P). 
In the same way, if 6 < a, then K{a, b, a, f3, 7) < B{a, 7), so that 

K{a,b,a,P,j) <B{a,p)+B{a,-f). (1) 
Now, let Wii be a random variable on (0, a Ab) with density 

^ w'^rHa-wnf-Hb-wnr-\ 



K{a,b,a,fi,j) 

let {U12, • • . , Uim) be a random vector distributed according to a Dirichlet(/3i, . . . , Pm-i] 
let {U21, . . • , Umi) be distributed according to a Dirichlet(7i, . . . , 7m-i)i and fur- 
ther assume independence between Wu, {U12, ■ ■ ■ , Uim), and {U21, • • • , Umi)- Let 
= {a-Wn)Uij, j = 2,...,m, and Wa = {b - Wn)Ua, i = 2,...,m. 
From this construction, given Wu = wu, we have that {W12, ■ ■ ■ ,Wim) and 
(W21, ■ ■ ■ ,Wmi) are conditionally independent with conditional densities given 
respectively by 



1 r(/3) . 



with wij > 0, j = 2, . . . , m, Yl2<j<m ""^ij = 0-^^11 and 



m — 1 ■ 



-1 



with > 0, i = 2, . . . , m, X]2<i<m'"^ii = b — wu. This construction together 
with inequality ([T| implies the result, namely 

„ m m m— 1 »n— 1 



j=2 i=2 jr=2 j=2 



(S(a,/3) + B(a,7)) 



nr=i^r(/3,)ni=1^r(7.) 
r(/?)r(7) 

□ 
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